Add `truncated_rows` parameter to `register_csv()` and `read_csv()` by djouallah · Pull Request #1359 · apache/datafusion-python

djouallah · 2026-01-31T02:03:06Z

Summary

Exposes the truncated_rows parameter from DataFusion Rust to Python bindings for register_csv() and read_csv() methods. This parameter enables reading CSV files with inconsistent column counts by creating a union schema and filling missing columns with nulls.
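
To make the union-schema idea concrete, here is a minimal pure-Python sketch using only the standard library. It is an illustration of the described semantics, not DataFusion's implementation: `read_with_union_schema` and `demo` are hypothetical names, the files are assumed to have header rows, and values stay as strings because no type inference is attempted.

```python
import csv
import tempfile
from pathlib import Path


def read_with_union_schema(paths):
    """Read headered CSV files and pad columns a file lacks with None."""
    columns = []   # union of column names, in first-seen order
    per_file = []  # (header, data rows) for each file

    for path in paths:
        with open(path, newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader)
            rows = list(reader)
        per_file.append((header, rows))
        for name in header:
            if name not in columns:
                columns.append(name)

    records = []
    for header, rows in per_file:
        for row in rows:
            # zip stops at the shorter input, so a row shorter than its
            # header simply lacks those keys and is padded with None below.
            values = dict(zip(header, row))
            records.append({name: values.get(name) for name in columns})
    return columns, records


def demo():
    """Write two small CSV files with different column counts and merge them."""
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "services_2024.csv"
        new = Path(tmp) / "services_2025.csv"
        old.write_text("id,name\n1,alpha\n")
        new.write_text("id,name,region\n2,beta,eu\n")
        return read_with_union_schema([old, new])


if __name__ == "__main__":
    columns, records = demo()
    print(columns)
    for record in records:
        print(record)
```

In this sketch the row from the two-column file gets `None` for `region`, which mirrors the "fill missing columns with nulls" behavior described above.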

Background

The truncated_rows feature was added to DataFusion Rust in apache/datafusion#17553 (merged October 8, 2025) and is available in DataFusion 51.0.0.

Current workaround: Users can already use truncated_rows via SQL with external tables:

ctx.sql("""
    CREATE EXTERNAL TABLE mixed
    STORED AS CSV
    LOCATION 'file1.csv', 'file2.csv'
    OPTIONS ('truncated_rows' 'true')
""")

Problem: The SQL LOCATION clause does not support lists of file paths as separate arguments, so a statement like the one above cannot name several files individually :(

Solution: register_csv() and read_csv() accept Python lists of paths, making it much more ergonomic:

# Much cleaner API!
ctx.register_csv(
    "mixed",
    ["file1.csv", "file2.csv", "file3.csv"],
    truncated_rows=True
)

Changes

✅ Add truncated_rows: bool = False parameter to SessionContext.register_csv()
✅ Add truncated_rows: bool = False parameter to SessionContext.read_csv()
✅ Update Rust PyO3 bindings in src/context.rs
✅ Update Python wrappers in python/datafusion/context.py
✅ Add tests verifying parameter acceptance
✅ Update docstrings with parameter documentation

Example Usage

from datafusion import SessionContext

ctx = SessionContext()

# Register multiple CSV files with different schemas
ctx.register_csv(
    "services",
    ["services_2024.csv", "services_2025.csv"],  # Different column counts
    truncated_rows=True  # Create union schema, fill missing columns with nulls
)

# Query across files with different schemas
result = ctx.sql("SELECT * FROM services").collect()

Testing

Tests verify that the truncated_rows parameter is accepted by the Python bindings. The actual behavior of the feature is tested in the upstream DataFusion repository.

This follows the principle that Python bindings should expose all Rust API parameters, and behavior testing is the responsibility of the upstream DataFusion library.

Backward Compatibility

✅ Non-breaking change. The parameter defaults to False, maintaining existing behavior.

Exposes the truncated_rows parameter from Rust DataFusion to Python bindings. This enables reading CSV files with inconsistent column counts by creating a union schema and filling missing columns with nulls.

The parameter was added to DataFusion Rust in PR apache/datafusion#17553 and is now available in datafusion 51.0.0.

Changes:
- Add truncated_rows parameter to SessionContext.register_csv()
- Add truncated_rows parameter to SessionContext.read_csv()
- Add comprehensive tests for both methods
- Update docstrings with parameter documentation

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

The tests now only verify that the truncated_rows parameter is accepted by the Python bindings, not the actual behavior. Behavior testing is an upstream DataFusion concern (apache/datafusion#17553). This follows the principle that Python bindings should expose all Rust API parameters regardless of upstream implementation status.

timsaucer · 2026-01-31T10:45:27Z

Thank you for the PR. How would you feel about making a more general solution as described in #1358? If we're updating this, we could ensure we have all of the options exposed to our users.

djouallah · 2026-01-31T10:51:51Z

@timsaucer I am not comfortable with this whole thing yet. I know what you want :) and it makes perfect sense, but I don't want to get too excited and do something silly yet :)

timsaucer · 2026-01-31T13:20:22Z

> @timsaucer I am not comfortable with this whole thing yet. I know what you want :) and it makes perfect sense, but I don't want to get too excited and do something silly yet :)

Ok, understood. I think it would be better if we added a more general solution instead of just adding one piece at a time, though. Maybe I will take a swing at this later today.

timsaucer · 2026-02-01T15:38:03Z

What do you think about going with something more like #1361?

djouallah · 2026-02-01T15:43:15Z

Thanks, that's a better solution 😃

djouallah and others added 4 commits January 31, 2026 10:49

trigger CI

aac8ff0

Fix import ordering for ruff linter

f61b184

djouallah mentioned this pull request Jan 31, 2026

Expose all CSV read options #1358

Open

djouallah closed this Feb 1, 2026

djouallah wants to merge 4 commits into apache:main from djouallah:feat/add-truncated-rows-csv-parameter
